Stack Exchange Data Dump
All user content contributed to the Stack Exchange network is licensed under CC BY-SA 3.0, intended to be shared and remixed. We even provide all our data as a convenient data dump.
Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Specifically, the attribution requirements are as follows:
- Visually display or otherwise indicate the source of the content as coming from the Stack Exchange Network. This requirement is satisfied with a discreet text blurb or some other unobtrusive but clear visual indication.
- Ensure that any Internet use of the content includes a hyperlink directly to the original question on the source site on the Network (e.g., http://stackoverflow.com/questions/12345)
- Visually display or otherwise clearly indicate the author names for every question and answer used
For more information, see the Stack Exchange Terms of Service.
Subject: information about data
Subject: No files available
Subject: Access restricted
Subject: XML file into Stata
Subject: Not available for download
Subject: Data Error
Subject: stackoverflow.com-PostLinks.7z corrupted
Subject: File size limit
I cannot use torrents (I am in Cuba).
The only option for us is to download files with wget.
However, when we try to download the full Stack Exchange data dump, we get this message:
"total size of requested files (44 GB) is too large for zip-on-the-fly"
Can I please ask you to remove this limitation?
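A practical workaround for the zip-on-the-fly limit is to fetch the per-site archives one at a time rather than the whole bundle. The sketch below only builds candidate download URLs; the base URL pattern is an assumption based on this item's Archive.org name, and the site list is illustrative:

```python
# Minimal sketch: construct per-file download URLs for use with wget,
# instead of requesting the whole 44 GB zip-on-the-fly bundle.
# ASSUMPTION: files live under the Archive.org item "stackexchange"
# and are named "<site hostname>.7z" (Stack Overflow is split into
# multiple files and is not covered by this pattern).

BASE = "https://archive.org/download/stackexchange"

def dump_urls(sites):
    """Return one direct-download URL per site hostname."""
    return [f"{BASE}/{site}.7z" for site in sites]

# Example (hypothetical site list):
urls = dump_urls(["askubuntu.com", "serverfault.com"])
for u in urls:
    print(u)
```

Each printed URL can then be passed to `wget -c <url>`, where `-c` resumes a partial download, which matters on an unreliable connection.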
Subject: Great data set
Also note: Info about sites and their corresponding file names can be indirectly obtained via the /sites API query (http://api.stackexchange.com/docs/sites): For every site, the file name will match either the primary URL of that site *or* the URL of one of that site's aliases. There are no exceptions beyond SO (whose dump is uniquely split among multiple files).
Sites that are very new relative to the archive date are not included in the dump.
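The mapping described above — dump file name matches either a site's primary URL or one of its aliases — can be sketched as a small helper. The record shape below (a dict with `site_url` and `aliases` keys) mirrors the `/sites` API response, but the example record itself is made up:

```python
from urllib.parse import urlparse

def candidate_dump_names(site):
    """Given a /sites API record (dict with 'site_url' and an optional
    'aliases' list of URLs), return the hostnames that the dump file
    name may match, e.g. "<hostname>.7z"."""
    urls = [site["site_url"]] + site.get("aliases", [])
    return [urlparse(u).netloc for u in urls]

# Hypothetical record shaped like an API response entry:
rec = {"site_url": "https://askubuntu.com",
       "aliases": ["https://ubuntu.stackexchange.com"]}
print(candidate_dump_names(rec))
# prints ['askubuntu.com', 'ubuntu.stackexchange.com']
```

Checking each candidate hostname against the file listing should locate the right archive for every site except Stack Overflow, whose dump is split across multiple files.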
Subject: Better filenames
Subject: Torrent out of date
Subject: .torrent is fixed
If you want to parse these large XML files, you need to use a streaming parser. I'm not surprised that an XML editor would have problems with a 40 gigabyte XML file! Most XML parsing software libraries have a streaming option.
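A minimal streaming-parse sketch using Python's standard-library `xml.etree.ElementTree.iterparse`: the dump files contain one `<row .../>` element per record inside a single root, so you can process each row and discard it before the next one is read. The sample document below is a tiny stand-in with assumed attribute names:

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Tiny stand-in for a (potentially 40 GB) Posts.xml; the real dump
# uses one <row .../> per post inside a single root element.
sample = b"""<?xml version="1.0"?>
<posts>
  <row Id="1" PostTypeId="1" Score="10" Title="First question"/>
  <row Id="2" PostTypeId="2" Score="3" ParentId="1"/>
</posts>"""

count = 0
for event, elem in ET.iterparse(BytesIO(sample), events=("end",)):
    if elem.tag == "row":
        count += 1
        # process elem.attrib here, e.g. elem.attrib["Id"]
        elem.clear()  # free the element so the full tree is never built
print(count)  # prints 2
```

For a real file, replace `BytesIO(sample)` with an open file handle; memory use then stays flat no matter how large the file is, because only one row is held at a time.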
Subject: How to open posts.xml file?
Subject: What sense do we make of the files?
I cannot download the torrent version either - can someone help make sense of the files?
Subject: torrent out of date
The torrent appears to link to an older version of the data dump; for Stack Overflow, the latest post I saw was from 2014.
However, the .zip file appears to have data up to August 2015.
Subject: The current dump is old from 2014-09
Subject: This hit the shuffle.php bug
Subject: Missing File
Does anyone know of a source for this file?
Subject: for those who can't get BitTorrent working
Subject: Bittorrent download broken
Subject: Thanks for sharing
Subject: April data
I see only March data. How can I get April data?
What about January and February?
But then I tried again while logged in, and I was able to download the whole file :-)
This should be made available via FTP!
Subject: File is broken
The torrent gets stuck at 70.8%.
Can anyone help me get this file?
Subject: No seed?
Subject: Seeds for 3/16/15 version?
Subject: Thanks and tests
Fun to see how small the whole SE network is after all: only a few GB compressed. Wikimedia project dumps compress very well too, but they're still much bigger (while still fitting on a common hard disk).
Subject: Latest Dump.
Subject: Really Cool
Uploaded by Stack Exchange on